
Conversation

@nenad1002
Contributor

No description provided.

namespace Microsoft.ML.OnnxRuntimeGenAI
{
public static class SignalProcessor
{
Contributor

@kunal-vaishnavi Nov 7, 2025

Can these APIs be added to the MultiModalProcessor instead of introducing a new class? All multi-modal models (e.g. Whisper, Phi-4 mm, etc.) use that processor for pre-processing.


[DllImport(NativeLib.DllName, CallingConvention = CallingConvention.Winapi)]
public static extern int /* extError_t */ OgaSplitSignalSegments(
IntPtr /* const OgaTensor* */ input,
Contributor

Can we follow the same tab spacing as the other native method APIs listed in this file for the new APIs?

}

template <typename T>
OrtxTensor* MakeOrtxTensor(Generators::Tensor* src) {
Contributor

Since the Generators namespace is included at the top, I think you can remove the Generators:: part before each Tensor.

namespace Generators {

const OgaTensor* hop_ms_tensor,
const OgaTensor* energy_threshold_db_tensor,
OgaTensor* output0) {
OGA_TRY
Contributor

Can we add unit tests for the new APIs in ORT GenAI's C API tests?

*/
OGA_EXPORT OgaResult* OGA_API_CALL OgaProcessorProcessImagesAndAudiosAndPrompts(const OgaMultiModalProcessor*, const OgaStringArray* prompts, const OgaImages* images, const OgaAudios* audios, OgaNamedTensors** input_tensors);

OGA_EXPORT OgaResult* OGA_API_CALL OgaSplitSignalSegments(
Contributor

Can you add C# API tests that call these C APIs?

const OgaTensor* energy_threshold_db_tensor,
OgaTensor* output0);

OGA_EXPORT OgaResult* OGA_API_CALL OgaMergeSignalSegments(
Contributor

Let's add comments above these new APIs to explain their usage and their parameters.
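For instance (a rough sketch only; the parameter meanings below are inferred from this review thread rather than taken from a finalized header), the split API could be documented like this:

/*
 * Splits an input signal into segments based on per-frame energy.
 *
 * input                      - tensor holding the raw input signal samples
 * sr_tensor                  - tensor holding the sampling rate of the signal in Hz
 * frame_ms_tensor            - tensor holding the analysis frame size in milliseconds
 * hop_ms_tensor              - tensor holding the hop length between frames in milliseconds
 * energy_threshold_db_tensor - tensor holding the energy threshold in dB used to detect segment boundaries
 * output0                    - tensor that receives the computed segments
 */
OGA_EXPORT OgaResult* OGA_API_CALL OgaSplitSignalSegments(
    const OgaTensor* input,
    const OgaTensor* sr_tensor,
    const OgaTensor* frame_ms_tensor,
    const OgaTensor* hop_ms_tensor,
    const OgaTensor* energy_threshold_db_tensor,
    OgaTensor* output0);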


OGA_EXPORT OgaResult* OGA_API_CALL OgaMergeSignalSegments(
const OgaTensor* segments_tensor,
const OgaTensor* merge_gap_ms_tensor,
Contributor

Given that the type of the object is an OgaTensor*, I think we can remove the _tensor suffix from the parameter names.
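For example, the merge declaration would then read something like this (illustrative only; any parameters not visible in this diff are omitted):

OGA_EXPORT OgaResult* OGA_API_CALL OgaMergeSignalSegments(
    const OgaTensor* segments,      // was: segments_tensor
    const OgaTensor* merge_gap_ms,  // was: merge_gap_ms_tensor
    OgaTensor* output0);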

reinterpret_cast<const Generators::Tensor*>(input),
reinterpret_cast<const Generators::Tensor*>(sr_tensor),
reinterpret_cast<const Generators::Tensor*>(frame_ms_tensor),
reinterpret_cast<const Generators::Tensor*>(hop_ms_tensor),
Contributor

To set the return value to output0 in a way that does not cause memory corruption, you may need to use ReturnShared<OgaTensor> and change output0 to OgaTensor**.

OgaResult* OGA_API_CALL OgaTokenizerEncodeBatch(const OgaTokenizer* tokenizer, const char** strings, size_t count, OgaTensor** out) {
  OGA_TRY
  auto tensor = tokenizer->EncodeBatch(std::span<const char*>(strings, count));
  *out = ReturnShared<OgaTensor>(tensor);
  return nullptr;
  OGA_CATCH
}

Let's also rename output0 to out since there's just one output and to be consistent with existing API conventions.
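Applied to the new split API, the pattern might look roughly like this (a sketch only; it assumes the C++-side SplitSignalSegments helper is changed to return the result tensor instead of writing into a caller-provided one):

OgaResult* OGA_API_CALL OgaSplitSignalSegments(const OgaTensor* input, const OgaTensor* sr,
                                               const OgaTensor* frame_ms, const OgaTensor* hop_ms,
                                               const OgaTensor* energy_threshold_db, OgaTensor** out) {
  OGA_TRY
  // Hypothetical: assumes Generators::SplitSignalSegments returns the segments tensor.
  auto tensor = Generators::SplitSignalSegments(
      reinterpret_cast<const Generators::Tensor*>(input),
      reinterpret_cast<const Generators::Tensor*>(sr),
      reinterpret_cast<const Generators::Tensor*>(frame_ms),
      reinterpret_cast<const Generators::Tensor*>(hop_ms),
      reinterpret_cast<const Generators::Tensor*>(energy_threshold_db));
  *out = ReturnShared<OgaTensor>(tensor);
  return nullptr;
  OGA_CATCH
}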

Contributor Author

@nenad1002 Nov 10, 2025

It doesn't cause memory corruption anymore; the code I have here now works correctly, so I would rather not risk breaking it by changing it again.

OgaTensor* output0) {
OGA_TRY
return reinterpret_cast<OgaResult*>(
Generators::MergeSignalSegments(
Contributor

We can move SplitSignalSegments and MergeSignalSegments inside a processor object. The outputs from those methods can then be set to the output tensors from the processor APIs.

auto tensor = processor->SplitSignalSegments(...);
*out = ReturnShared<OgaTensor>(tensor);

auto tensor = processor->MergeSignalSegments(...);
*out = ReturnShared<OgaTensor>(tensor);

Then these new APIs could be called similar to how methods such as OgaTokenizerEncodeBatch look in the above review comment. This will also help add future support for these new APIs in other language bindings.
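For instance, the processor-facing side might expose member functions along these lines (purely illustrative signatures; the names and parameter types are assumptions):

// Hypothetical additions to the existing processor type in the Generators namespace:
std::shared_ptr<Tensor> SplitSignalSegments(const Tensor& input, const Tensor& sr, const Tensor& frame_ms,
                                            const Tensor& hop_ms, const Tensor& energy_threshold_db) const;
std::shared_ptr<Tensor> MergeSignalSegments(const Tensor& segments, const Tensor& merge_gap_ms) const;

The C API wrappers would then mirror OgaTokenizerEncodeBatch above: call the member function and hand the result back through ReturnShared<OgaTensor>.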

Contributor Author

@nenad1002 Nov 10, 2025

Why would I have to construct a new object that has no connection to my current functionality? I think what we need to clarify here is whether we even need a Processor class, and if we do, what exactly should be inside it, because I see many methods that are not part of the class (e.g. ProcessTensor is not part of the class but lives in the same file).

I can do this, but we really need to clarify this first.

{
long[] inputShape = new long[] { 1, inputSignal.Length };

input = CreateFloatTensorFromArray(inputSignal, inputShape);
Contributor

Can you construct a Tensor object directly for the input data? Each Tensor object has its own IntPtr handle that you can pass to the NativeMethods C API call.

public class Tensor : IDisposable
{
    private IntPtr _tensorHandle;
    private bool _disposed = false;

    public Tensor(IntPtr data, Int64[] shape, ElementType type)
    {
        Result.VerifySuccess(NativeMethods.OgaCreateTensorFromBuffer(data, shape, (UIntPtr)shape.Length, type, out _tensorHandle));
    }

    internal Tensor(IntPtr tensorHandle)
    {
        Debug.Assert(tensorHandle != IntPtr.Zero);
        _tensorHandle = tensorHandle;
        _disposed = false;
    }

    internal IntPtr Handle { get { return _tensorHandle; } }

The Tensor class can help with memory management during disposal.

Here is an example with Tensor from a unit test.

public void TestTensorAndAddExtraInput()
{
    string modelPath = _tinyRandomGpt2ModelPath;
    using var model = new Model(modelPath);
    Assert.NotNull(model);

    using var generatorParams = new GeneratorParams(model);
    Assert.NotNull(generatorParams);

    float[] data = { 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24 };
    long[] shape = { 3, 5 };

    // Pin the array to get its pointer
    GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
    try
    {
        IntPtr data_pointer = handle.AddrOfPinnedObject();
        using var tensor = new Tensor(data_pointer, shape, ElementType.float32);
        Assert.NotNull(tensor);
        Assert.Equal(shape, tensor.Shape());
        Assert.Equal(ElementType.float32, tensor.Type());

        using var generator = new Generator(model, generatorParams);
        Assert.NotNull(generator);
        generator.SetModelInput("test_input", tensor);
    }
    finally
    {
        handle.Free();
    }
}

Contributor Author

Yes, this approach gave me memory corruption as well; there might be an issue with the Tensor class, but I can give it another try.

const OrtxTensor* frame_tensor = Generators::MakeOrtxTensorConst<int64_t>(frame_ms_tensor);
const OrtxTensor* hop_tensor = Generators::MakeOrtxTensorConst<int64_t>(hop_ms_tensor);
const OrtxTensor* thr_tensor = Generators::MakeOrtxTensorConst<float>(energy_threshold_db_tensor);
OrtxTensor* out_tensor = Generators::MakeOrtxTensor<int64_t>(output0);
Contributor

I would recommend looking at how the tokenizer and processor APIs are currently implemented (e.g. the Process method in the Whisper processor). That will help make the implementations of the two new methods consistent with the existing ones.

  1. You can use CheckResult in place of the err check.

  2. Are tensors really needed for most of the parameters passed to OrtxSplitSignalSegments? The sampling rate (sr_tensor), frame size (frame_ms), hop length (hop_ms), and energy threshold (energy_threshold_db) are singular values. Can we modify OrtxSplitSignalSegments to use primitive types so that primitive types can be used in ORT GenAI for the "language binding API --> ORT GenAI C API --> ORT GenAI C++ API" flow? That would also greatly reduce the need for the workarounds you have in the C# bindings and in this file (a sketch follows below).

  3. If OrtxSplitSignalSegments is updated, you can have the output stored as OrtxTensorResult** result inside that extensions API. Then you can create the output tensor inside this method instead of passing it in as a parameter to SplitSignalSegments. Here is an example.

ort_extensions::OrtxObjectPtr<OrtxTensorResult> result;
CheckResult(OrtxFeatureExtraction(processor_.get(), audios->audios_.get(), result.ToBeAssigned()));
ort_extensions::OrtxObjectPtr<OrtxTensor> mel;
CheckResult(OrtxTensorResultGetAt(result.get(), 0, mel.ToBeAssigned()));
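If the extensions API were changed that way, the matching ORT GenAI C API could shrink to something like this (illustrative only; the exact primitive types are assumptions):

OGA_EXPORT OgaResult* OGA_API_CALL OgaSplitSignalSegments(
    const OgaTensor* input,     // the raw signal stays a tensor
    int32_t sr,                 // sampling rate in Hz
    int32_t frame_ms,           // frame size in milliseconds
    int32_t hop_ms,             // hop length in milliseconds
    float energy_threshold_db,  // energy threshold in dB
    OgaTensor** out);           // output tensor created inside the call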

Contributor Author

  1. It's more consistent having tensors, but I can do your approach as well; I have no preference. I was just trying to be consistent, plus I haven't seen built-in types used in the Oga API.

  2. It is possible, yes, but since I am initializing tensors on the C# side, again for consistency and simplicity (no need for **), I decided to create everything on the caller side. But the extension should, in theory, work for both cases.

@nenad1002
Contributor Author

Thanks for reviewing the PR, @kunal-vaishnavi. I am currently still testing the changes from here with Audio Streaming, so for now I will only address the comments that are fast to resolve; the others will follow later.
